Iteration for Factored MDPsDaphne

نویسندگان

  • Daphne Koller
  • Ronald Parr
چکیده

Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not retain the structure of the process, recent work has suggested that value functions in factored MDPs can often be approximated well using a factored value function: a linear combination of restricted basis functions, each of which refers only to a small subset of variables. An approximate fac-tored value function for a particular policy can be computed using approximate dynamic programming , but this approach (and others) can only produce an approximation relative to a distance metric which is weighted by the stationary distribution of the current policy. This type of weighted projection is ill-suited to policy improvement. We present a new approach to value determination, that uses a simple closed-form computation to compute a least-squares decomposed approximation to the value function for any weights directly. We then use this value determination algorithm as a subroutine in a policy iteration process. We show that, under reasonable restrictions, the policies induced by a factored value function can be compactly represented as a decision list, and can be manipulated eeciently in a policy iteration process. We also present a method for computing error bounds for decomposed value functions using a variable-elimination algorithm for function optimization. The complexity of all of our algorithms depends on the factorization of the system dynamics and of the approximate value function.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Computing real low-rank solutions of Sylvester equations by the factored ADI method

We investigate the factored ADI iteration for large and sparse Sylvester equations. A novel low-rank expression for the associated Sylvester residual is established which enables cheap computations of the residual norm along the iteration, and which yields a reformulated factored ADI iteration. The application to generalized Sylvester equations is considered as well. We also discuss the efficie...

متن کامل

Symbolic Opportunistic Policy Iteration for Factored-Action MDPs

This paper addresses the scalability of symbolic planning under uncertainty with factored states and actions. Our first contribution is a symbolic implementation of Modified Policy Iteration (MPI) for factored actions that views policy evaluation as policy-constrained value iteration (VI). Unfortunately, a naı̈ve approach to enforce policy constraints can lead to large memory requirements, somet...

متن کامل

Factored Value Iteration Converges

In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one, the least-squares projection operator is modified so that it does not increase max-norm, and thus preserves convergence. The other modification is that we un...

متن کامل

Policy Iteration for Factored MDPs

Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not re­ tain the structure of the process, recent work has suggested that value functions in factored MDPs can often be approximated well using a factored value function: a linear combination of restr icted basis functions, each of which refers only to a small subset ...

متن کامل

The Factored Frontier Algorithm for Approximate Inference in DBNs

The Factored Frontier (FF) algorithm is a simple approximate inference algorithm for Dynamic Bayesian Networks (DBNs). It is very similar to the fully factorized version of the Boyen-Koller (BK) algorithm, but in­ stead of doing an exact update at every step followed by marginalisation (projection), it always works with factored distributions. Hence it can be applied to models for which the exa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000